JAPANESE VIRTUAL DICTIONARY 



FIELD OF THE INVENTION 



[0001] The present invention relates generally to the field of electronic text 
entry, and more particularly to a method of entering Japanese hiragana characters 
and translating into appropriate Japanese words using a combination of hiragana, 
katakana and kanji characters. 



BACKGROUND OF THE INVENTION 



[0002] The Japanese written language uses three separate character sets. 
Simple Japanese characters representing phonetic syllables belong to the 
hiragana and katakana character sets (together referred to as "kana"). Hiragana 
characters, which are characterized by a cursive style, are typically used for words 
native to Japan. Katakana characters, which are characterized by a more angular 
style, are typically used for words borrowed from other cultures, or for emphasis 
and sound effects. The third character set in Japanese is kanji. Kanji are the 
complex Japanese characters borrowed from the Chinese language. There are over 
9000 kanji characters in the Japanese language. Approximately 4000 kanji are used 
on a semi-regular basis, while knowledge of 2000 kanji is generally required to read 
a newspaper or get around in Japan. The complexity of the Japanese written 
language poses several challenges for efficient text entry in computers, word 
processors, and other electronic devices. 

[0003] Figure 1A shows an example of Japanese hiragana and katakana 
characters. The hiragana 151 and katakana 152 character sets each contain 46 base 
characters. Both sets of kana have identical pronunciations and rules of 
construction; only the shapes of the characters differ, to emphasize the 
different usage of the words. Some base kana characters are used in certain 
combinations and in conjunction with special symbols (called "nigori" and "maru") 
to produce voiced and aspirated variations of the basic syllables, thus resulting in a 
full character set for representing the approximately 120 different Japanese phonetic 
sounds. If a Japanese keyboard included separate keys for all of the voiced and 
aspirated variants of the basic syllables, the keyboard would need to contain at least 
80 character keys. Such a large number of keys creates a crowded keyboard whose 
keys are often not easily discernible. If the nigori and maru symbol keys are 
included separately, the number of character keys can be reduced to 57. 
However, generating a voiced or aspirated version of a base character then requires 
the user to enter two or more keystrokes for a single character. 

[0004] Common methods of Japanese text entry for computers and like devices 
typically require the use of a standard Japanese character keyboard or a roman 
character keyboard that has been adapted for Japanese use. A typical kana 
keyboard has keys that represent only one kana set (usually hiragana), 
which may be input directly from the keyboard. A conventional method is to take 
the hiragana text entered on such a keyboard as an input and 
convert it into Japanese text using a process called kana-kanji conversion. A 
typical Japanese text is written with hiragana, katakana and kanji characters, such 
as sentence 150, which has the English meaning "Watch a movie in San Jose". The 
text 150 includes katakana characters 154, which correspond to the foreign 
word "San Jose", a hiragana character 155 that is normally used as a particle, and 
a kanji character set 153. 

[0005] Figure 1B shows a conventional method of converting a hiragana text to 
a Japanese text. Referring to Figure 1B, the Japanese hiragana characters are entered 
101 through a keyboard. The hiragana characters are converted 102 to Japanese 
text by looking up characters in a database (e.g., a dictionary). The user then has to 
inspect 103 and check 104 whether the conversion is correct. If the conversion is 
incorrect (e.g., the dictionary does not contain such a conversion), the user has to 
manually force the system to convert the hiragana text. A typical user interaction 
involves selecting 105 the portions of the hiragana text that were converted 
incorrectly and explicitly instructing 106 the system to convert those portions. The 
system then presents 107 a candidate list including all possible choices. The user 
normally checks 109 whether the conversion is correct. If the conversion is correct, 
the user then selects 108 a choice as the best output and inserts the correct result to 
form the final output text. If the conversion is incorrect, the user reselects a 
different portion of the input and tries to manually convert the reselected portion 
again. 

[0006] One of the conventional methods, transliteration (direct conversion from 
hiragana to katakana), does not provide a correct result in most 
cases, because users typically select (e.g., in the method shown in Figure 1B) 
not the katakana word alone but a segment containing the word and one or more 
trailing post particles, which are written in hiragana in the final form. Plain 
transliteration also converts all of the trailing post particles to katakana, which is 
incorrect. 

[0007] Another conventional method generates alternative candidates by 
transliterating the leading sub-string of the string. This method takes advantage of 
the fact that the trailing particles are always trailing and are all in hiragana. This 
method creates many candidates that may include the correct one among them. 
The following is an illustration of the conventional method (in 
English): 



input:      inthehouse 
output 1:   INTHEHOUSE 
output 2:   i NTHEHOUSE 
output 3:   in THEHOUSE 
output 4:   int HEHOUSE 
output 5:   inth EHOUSE 
output 6:   inthe HOUSE - (correct one) 
output 7:   intheh OUSE 
output 8:   intheho USE 
output 9:   inthehou SE 
output 10:  inthehous E 
output 11:  inthehouse 
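By way of illustration only, the enumeration above can be reproduced with a short sketch. Upper case stands in for katakana and lower case for hiragana, as in the English example; the function name `leading_substring_candidates` is an assumption made for this sketch and does not appear in this application.

```python
def leading_substring_candidates(text: str) -> list[str]:
    """Enumerate every split point, upper-casing the part after the split.

    Upper case stands in for katakana, lower case for hiragana, mirroring
    the English illustration.
    """
    candidates = []
    for split in range(len(text) + 1):
        kept, converted = text[:split], text[split:].upper()
        # Join the two parts with a space; strip() drops it when one part is empty.
        candidates.append(f"{kept} {converted}".strip())
    return candidates


for number, candidate in enumerate(leading_substring_candidates("inthehouse"), start=1):
    print(f"output {number}: {candidate}")
```

An input of n characters yields n + 1 candidates, which is why the list grows quickly with the length of the selected portion.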



As described above, the conventional method generates many candidates after the 
user selects a portion of the input text to be corrected, which may make the final 
selection confusing, even though the candidates may include a correct choice. 
Another conventional method involves an analyzer that can recognize post 
particles. It scans the range from the end until it cannot find any more post 
particles. However, the conventional methods require user interaction 
to achieve accurate results, which potentially lowers efficiency. 
[0008] One of the disadvantages of the conventional method is that if a 
katakana word is not in the dictionary, the conversion containing the katakana 
word usually fails. Another disadvantage of this method is that it requires 
user-specific interaction to convert and select the best candidate. It consumes more time 
and effort if the user does not know the possible outputs of the conversion. Hence, 
a better method to automatically and efficiently convert a Japanese hiragana character 
string to a katakana character string is highly desirable. 
SUMMARY OF THE INVENTION 



[0009] The present invention discloses methods and apparatuses for converting 
a first character string to a second character string. In addition to a regular 
dictionary, the invention includes a virtual dictionary to generate an artificial 
character string based on the first character string. When the first character string 
cannot be converted through a regular dictionary (e.g., the regular dictionary does 
not know the first character string), the invention uses the artificial character string 
generated by the virtual dictionary to convert the first character string. Therefore, 
with the virtual dictionary of the invention, the conversion never fails. 
[0010] In one aspect of the invention, an exemplary method includes receiving a 
hiragana input, automatically determining a plurality of possible katakana 
candidates based on the hiragana input, analyzing the plurality of possible katakana 
candidates to convert the hiragana input to katakana characters, selecting one of the 
katakana candidates, and outputting converted text comprising the one of the 
katakana candidates and, at least in some cases, kanji characters. 
[0011] In another aspect of the invention, an exemplary method includes 
receiving a first character string having the source character string, dividing the first 
character string into a plurality of sub-strings, converting the plurality of the sub- 
strings to second character strings through a dictionary, creating third character 
strings corresponding to the plurality of the sub-strings, analyzing the second and 
third character strings, constructing fourth character strings from the second and 
third character strings based on the analysis, creating a candidate list based on the 
fourth character strings, selecting the target character string from the candidate list 
and outputting the target character string. 

[0012] In one particular exemplary embodiment, the method includes 
constructing the fourth character strings from the second character strings, if the 
second character strings contain a character string corresponding to the first 
character string, and constructing the fourth character strings from the third 
character strings if the second character strings do not contain the character string 
corresponding to the first character string. In another embodiment, the method 
includes examining the output of the conversion to determine whether the 
conversion is correct, providing the candidate list of alternative character strings if 
the conversion is incorrect, and selecting a character string from the candidate list as 
a final output. In a further embodiment, the method includes providing an artificial 
target character string and updating the database based on the artificially created 
character string. 

[0013] The present invention includes apparatuses which perform these 
methods, and machine-readable media which, when executed on a data processing 
system, cause the system to perform these methods. Other features of the present 
invention will be apparent from the accompanying drawings and from the detailed 
description which follows. 
BRIEF DESCRIPTION OF THE DRAWINGS 



[0014] The present invention is illustrated by way of example and not limitation 
in the figures of the accompanying drawings in which like references indicate 
similar elements. 

[0015] Figure 1A shows examples of Japanese characters including hiragana, 
katakana, and kanji characters. 

[0016] Figure 1B shows a conventional method of converting a hiragana text to 
a Japanese text. 

[0017] Figure 2 shows a computer system which may be used with the present 
invention. 

[0018] Figure 3 shows one embodiment of the kana-kanji conversion system of 
the present invention. 

[0019] Figure 4 shows an example of calculation of cost values of the katakana 
character set used by one embodiment of the invention. 
[0020] Figure 5 shows another embodiment of the kana-kanji conversion 
system with user interaction of the present invention. 

[0021] Figure 6A shows an embodiment of conversion processes from hiragana 
character set to katakana character set of the invention. 

[0022] Figure 6B shows an illustration of an example of the invention versus a 
process of a conventional method. 

[0023] Figure 7 shows a method of converting hiragana characters to katakana 
characters of the invention. 
[0024] Figure 8 shows another embodiment of conversion processes from 
hiragana character set to katakana character set of the invention. 
[0025] Figures 9A and 9B show another method of converting hiragana 
characters to katakana characters of the invention. 
DETAILED DESCRIPTION 



[0026] The following description and drawings are illustrative of the invention 
and are not to be construed as limiting the invention. Numerous specific details are 
described to provide a thorough understanding of the present invention. However, 
in certain instances, well-known or conventional details are not described in order 
not to obscure the present invention unnecessarily. 

[0027] Japanese is written with kanji (characters of Chinese origin) and two sets 
of phonetic kana symbols, hiragana and katakana. A single kanji character may 
contain one symbol or several symbols, and may, by itself, represent an entire word 
or object. Unlike kanji, kana have no intrinsic meaning unless combined with other 
kana or kanji to form words. Both hiragana and katakana contain 46 symbols each. 
Combinations and variations of the kana characters provide the basis for all of the 
phonetic sounds present in the Japanese language. All Japanese text can be written 
in hiragana or katakana. However, since there are no spaces between words in 
Japanese, it is inconvenient to read a sentence whose words are 
written in hiragana or katakana only. Therefore most Japanese 
texts include hiragana, katakana and kanji characters. Normally, kanji characters 
are used for nouns, adjectives or verbs, while hiragana and katakana are used for 
particles (e.g., "of", "at", etc.). 

[0028] As computerized word processors have greatly improved, 
Japanese word processing can be implemented through word processing software. 
Japanese characters are typically inputted as hiragana only, because it is impractical to 
include all of the hiragana, katakana and kanji characters (kana-kanji) on a keyboard. 
Therefore, there is considerable interest in an improved method of converting 
hiragana characters to katakana characters. The present invention introduces a 
unique method to convert hiragana characters to katakana characters automatically 
based on the predetermined relationships between hiragana characters and katakana 
characters. The methods are normally performed by software executed in a 
computer system. 

[0029] Figure 2 shows one example of a typical computer system, which may 
be used with the present invention. Note that while Figure 2 illustrates various 
components of a computer system, it is not intended to represent any particular 
architecture or manner of interconnecting the components as such details are not 
germane to the present invention. It will also be appreciated that network 
computers and other data processing systems (e.g., a personal digital assistant), 
which have fewer components or perhaps more components, may also be used with 
the present invention. The computer system of Figure 2 may, for example, be an 
Apple Macintosh computer or a personal digital assistant (PDA). 
[0030] As shown in Figure 2, the computer system 200, which is a form of a 
data processing system, includes a bus 202 which is coupled to a microprocessor 
203, a ROM 207, a volatile RAM 205 and a non-volatile memory 206. The 
microprocessor 203, which may be a G3 or G4 microprocessor from Motorola, Inc. 
or IBM, is coupled to cache memory 204 as shown in the example of Figure 2. The 
bus 202 interconnects these various components together and also interconnects 
these components 203, 207, 205, and 206 to a display controller and display device 
208 and to peripheral devices such as input/output (I/O) devices which may be 
mice, keyboards, modems, network interfaces, printers and other devices which are 
well known in the art. Typically, the input/output devices 210 are coupled to the 
system through input/output controllers 209. The volatile RAM 205 is typically 
implemented as dynamic RAM (DRAM) which requires power continually in order 
to refresh or maintain the data in the memory. The non-volatile memory 206 is 
typically a magnetic hard drive or a magnetic optical drive or an optical drive or a 
DVD RAM or other type of memory systems which maintain data even after power 
is removed from the system. Typically, the non-volatile memory will also be a 
random access memory although this is not required. While Figure 2 shows that 
the non-volatile memory is a local device coupled directly to the rest of the 
components in the data processing system, it will be appreciated that the present 
invention may utilize a non-volatile memory which is remote from the system, such 
as a network storage device which is coupled to the data processing system through 
a network interface such as a modem or Ethernet interface. The bus 202 may 
include one or more buses connected to each other through various bridges, 
controllers and/or adapters as are well known in the art. In one embodiment the I/O 
controller 209 includes a USB (Universal Serial Bus) adapter for controlling USB 
peripherals. 

[0031] Figure 3 shows a system used by an embodiment of the invention. 
Referring to Figure 3, the system 300 typically includes an input unit 301, an input 
method UI and system interface 302, a morphological analysis engine (MAE) 303, a 
dictionary management module (DMM) 305 and an output unit 308. The input unit 
301 may be a keyboard such as I/O device 210 of Figure 2. The input unit may be a 
touch pad, such as that of a personal digital assistant (PDA). The input unit may be a set of 
application programming interfaces (APIs) that receive inputs from an application. 
Other types of inputs may exist. The input unit 301 accepts inputted Japanese characters 
(e.g., Japanese hiragana characters). The hiragana characters are 
transmitted to the input method and interface 302, which in turn transmits them to the 
MAE 303. The MAE 303 then accesses databases, such as regular dictionaries 
307 and virtual dictionary 306, through DMM 305. The regular dictionaries 307 
may include most known Japanese words corresponding to the hiragana words. The 
regular dictionaries 307 may be stored in a random access memory (RAM), such as 
volatile RAM 205, or they may be stored on a hard disk, such as non-volatile memory 
206. In one embodiment, the regular dictionaries 307 may be stored in a remote 
storage location (e.g., network storage) accessed through a network. It is useful to note that 
the present invention may be implemented in a network computing environment, 
wherein the regular dictionaries may be stored in a server and an application 
executed in a client accesses the regular dictionaries through a network interface 
over a network. Multiple applications executed at multiple clients may access the 
regular dictionaries simultaneously and share the information of the regular 
dictionaries over the network. Although the regular dictionaries 307 are illustrated 
as a single dictionary, it will be appreciated that the regular dictionaries 307 may 
comprise multiple dictionaries or databases. In another embodiment, the regular 
dictionaries 307 may comprise multiple look-up tables. The virtual dictionary 306 
may directly convert every single hiragana character to a katakana character. The 
virtual dictionary may contain a look-up table that gives the katakana 
character for each hiragana character. The DMM 305 is responsible for managing 
all dictionaries including dictionaries 306 and 307. DMM 305 is also responsible for 
updating information in the dictionaries upon requests from the MAE 303. In 
one embodiment, the DMM 305 also manages another database 304 where all of the 
rules or policies are stored. 
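As a minimal sketch of such a character-by-character look-up table, assuming the text is held as Unicode: the hiragana block (U+3041 to U+3096) and the katakana block (U+30A1 to U+30F6) are laid out in parallel, 0x60 code points apart, so the table can be built arithmetically. The names `HIRAGANA_TO_KATAKANA` and `to_katakana` are illustrative assumptions, not part of the described system.

```python
# Unicode places hiragana (U+3041..U+3096) and katakana (U+30A1..U+30F6)
# in parallel blocks that are 0x60 code points apart.
KANA_OFFSET = 0x60
HIRAGANA_TO_KATAKANA = {
    chr(code): chr(code + KANA_OFFSET) for code in range(0x3041, 0x3097)
}


def to_katakana(hiragana: str) -> str:
    """Transliterate each hiragana character; leave anything else unchanged."""
    return "".join(HIRAGANA_TO_KATAKANA.get(ch, ch) for ch in hiragana)


print(to_katakana("さんのぜ"))  # hiragana spelling of "San Jose"
```

Characters outside the table, such as the long-vowel mark shared by both scripts, pass through unchanged.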

[0032] The virtual dictionary 306 may include direct translation of the hiragana 
characters to katakana characters. The virtual dictionary 306 may return multiple 
words with different parts of speech. In one embodiment, the virtual dictionary 
may return three parts of speech: noun, noun that can be used as a verb, and 
adjective. It is useful to note that artificially generated katakana words from the 
virtual dictionary look no different from regular words once they are returned from 
the virtual dictionary. 

[0033] In another embodiment, the dictionary database may be divided into two 
or more dictionaries. One of them is a regular dictionary containing regular words. 
The other dictionary is a special dictionary (e.g., the so-called virtual dictionary). The 
special dictionary may contain all possible katakana characters including the 
artificial katakana characters created during the processing. The katakana is a straight 
transliteration of the hiragana input. The virtual dictionary may return multiple 
words with different parts of speech. Each word has its own priority value. Such a 
priority value may be assigned by the virtual dictionary. For example, in the 
implementation for the string "A-Ka-Ma-I", the dictionary may return three outputs with 
different parts of speech (noun, noun that can work as a verb, and adjective), as 
follows: 



A-Ka-Ma-I POS: Noun Priority: 100 

A-Ka-Ma-I POS: Noun that can work as a verb Priority: 100 

A-Ka-Ma-I POS: Adjective Priority: 100 

Other implementations may exist. 
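The three records above can be sketched as follows. This is an illustration under stated assumptions rather than the implementation of the virtual dictionary: the `Entry` type, the `virtual_lookup` function and the hiragana spelling "あかまい" for "A-Ka-Ma-I" are all introduced here for the example, while the part-of-speech labels and the priority of 100 are taken from the listing.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    surface: str         # converted word as it would appear in the output
    part_of_speech: str
    priority: int


# Part-of-speech labels and priority taken from the example listing.
VIRTUAL_PARTS_OF_SPEECH = ("Noun", "Noun that can work as a verb", "Adjective")
VIRTUAL_PRIORITY = 100  # the same value for every artificial word


def virtual_lookup(hiragana: str) -> list[Entry]:
    """Return one artificial katakana entry per part of speech."""
    # Hiragana and katakana sit 0x60 code points apart in Unicode.
    katakana = "".join(
        chr(ord(ch) + 0x60) if "\u3041" <= ch <= "\u3096" else ch for ch in hiragana
    )
    return [Entry(katakana, pos, VIRTUAL_PRIORITY) for pos in VIRTUAL_PARTS_OF_SPEECH]


for entry in virtual_lookup("あかまい"):
    print(entry)
```

Because the entries are generated on demand, nothing needs to be stored for words the dictionary has never seen.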

[0034] The three words can be considered as one record, or they may be considered 
as three separate records. The priority value can be the same for all words returned 
from the dictionary. The priority value could be calculated from the katakana and/or 
the part of speech. In one embodiment, the priority value is determined by the 
length of the word. In another embodiment, the priority may be based on bi-gram 
and tri-gram statistics of the katakana and can be adjusted based on the part of 
speech. Typically the priority value is set lower than that of all or most regular words 
in the regular dictionaries, in order to prevent the artificial katakana words from 
appearing as the most probable conversion when there are proper regular words 
available. 

[0035] The part of speech defines how often or how easily words of a certain part 
of speech come next to words of another part of speech. It could be just a 
yes/no value. Depending on the implementation, there are cases where one word has two parts 
of speech: one for the right-side connection and the other for the left-side connection. 
There are also cases where the part of speech is used to determine the connection not only 
with the next or previous word, but also with the word before the previous one or after the next one. 
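One way to picture this is a table keyed by the parts of speech of two adjacent words. The sketch below is hypothetical: the table contents, the "Particle" and "Verb" labels and the function name `connection_cost` are assumptions chosen for illustration, and an infinite cost models the plain yes/no case in which a pair may not connect at all.

```python
# Hypothetical connection costs between adjacent parts of speech,
# keyed by (left word, right word). Lower means the pair connects more easily.
CONNECTION_COST = {
    ("Noun", "Particle"): 1,
    ("Particle", "Noun"): 2,
    ("Particle", "Verb"): 1,
    ("Adjective", "Noun"): 1,
}
DISALLOWED = float("inf")  # models the yes/no case: the pair may not connect


def connection_cost(left_pos: str, right_pos: str) -> float:
    """Look up how easily a word of right_pos follows a word of left_pos."""
    return CONNECTION_COST.get((left_pos, right_pos), DISALLOWED)


print(connection_cost("Noun", "Particle"))
print(connection_cost("Particle", "Particle"))
```

Keying the table by an ordered pair means a connection can be cheap in one direction and expensive in the other.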
[0036] Referring to Figure 3, the MAE 303 sends a request to DMM 305 to 
convert the inputted hiragana words. The DMM 305 searches the regular 
dictionaries 307 for corresponding Japanese words. Meanwhile, the MAE 
303 sends a request to DMM 305 to retrieve all possible katakana character 
combinations from the virtual dictionary 306. In general the MAE 303 will select 
the words from the regular dictionaries 307, if the dictionaries 307 contain such a 
direct translation. Otherwise, the MAE 303 will select an artificial katakana word 
created by the virtual dictionary 306. 
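The selection rule in this paragraph amounts to a look-up with a fallback, sketched below. The one-entry `REGULAR_DICTIONARY` is a toy stand-in for the regular dictionaries 307, and the names `convert_word` and `to_katakana` are assumptions made for this example.

```python
# Toy stand-in for regular dictionaries 307: hiragana reading -> converted word.
REGULAR_DICTIONARY = {
    "えいが": "映画",  # "movie"
}


def to_katakana(hiragana: str) -> str:
    """Hiragana and katakana sit 0x60 code points apart in Unicode."""
    return "".join(
        chr(ord(ch) + 0x60) if "\u3041" <= ch <= "\u3096" else ch for ch in hiragana
    )


def convert_word(reading: str) -> tuple[str, str]:
    """Prefer a regular dictionary word; otherwise fall back to artificial katakana."""
    word = REGULAR_DICTIONARY.get(reading)
    if word is not None:
        return word, "regular"
    return to_katakana(reading), "virtual"


print(convert_word("えいが"))
print(convert_word("さんのぜ"))
```

Since the fallback always produces a katakana string, every reading yields some conversion even when the regular dictionary has no entry.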

[0037] The MAE 303 also invokes a set of rules from a database 304 and 
applies the set of rules to the analysis of all possible combinations. The database 
304 containing the rules may be a separate database, or it may be the same database 
as the dictionary 306 or 307. Each of the possible combinations is associated with a 
usage frequency. The usage frequency represents how frequently the characters have 
previously been used. The dictionary may also include a connection relationship 
between each character set (e.g., noun, adjective, verb, etc.). The set of rules 
may include the information on usage frequency and connection relationship. The 
MAE 303 applies these rules to construct a candidate pool or list from the 
possible combinations from the dictionary 306. In one 
embodiment, the set of rules may include semantic or grammar rules to construct 
the candidate list. For example, the word "hot" may refer to temperature or to 
spicy food. When the word "hot" is associated with the word "summer", e.g., "hot 
summer", the word "hot" more likely refers to temperature than to spiciness. 
The MAE 303 may calculate the cost values of the candidates based on the set of 
rules. The final candidate may be the one with the least cost value in the candidate 
list. 

[0038] Figure 4 shows an example of two candidates being constructed to 
represent the word "San Jose", where each of them comprises a usage frequency. 
The first choice comprises characters 401 and 402, and the second choice comprises 
character set 404. Character 403 is a particle. Character 401 has a usage frequency 
of f1 and character 402 has a usage frequency of f2. The particle character 403 has a 
usage frequency of f3. In addition, the connection between characters 401 and 402 is 
c1, and the connection between characters 402 and 403 is c2. As a result, the cost value of the first 
choice may be: 

Cost Value 1 = f1 + f2 + f3 + c1 + c2 

Similarly, the second choice may have a cost value of: 

Cost Value 2 = fa + f3 + ca 

In one embodiment, the cost values may include semantic or grammar factors. The 
evaluation unit 303 evaluates the cost values of the two choices and selects the one with 
the least cost value, in this case cost value 2, as the final output of the conversion. 
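The comparison of the two cost values can be worked through with numbers. Every value below is invented purely for illustration, since the application gives none, and reading fa and ca as the usage frequency of character set 404 and its connection to particle 403 is an assumption.

```python
# All numbers are invented for illustration; the application gives no values.
f1, f2, f3 = 4, 5, 1   # usage-frequency terms for characters 401, 402 and 403
c1, c2 = 3, 2          # connection terms for 401-402 and 402-403
fa, ca = 6, 2          # assumed: frequency of 404 and its connection to 403

cost_value_1 = f1 + f2 + f3 + c1 + c2   # first choice: 401, 402, then particle 403
cost_value_2 = fa + f3 + ca             # second choice: 404, then particle 403

choices = {"first choice": cost_value_1, "second choice": cost_value_2}
best = min(choices, key=choices.get)    # the least cost value wins
print(choices, "->", best)
```

With these sample values the second choice wins because it pays for one word and one connection instead of two words and two connections.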
[0039] However, although the evaluation unit selects the final output based on 
the least cost value and in most cases the selected outputs are correct, in some rare 
cases the correct output may not have the least cost value. Under these circumstances, the 
invention provides an opportunity for a user to interact. Figure 5 shows another 
embodiment of the present invention. Referring to Figure 5, the system 300 
provides a user interaction 309, where the user can examine the output generated by 
the MAE 303 and determine whether the output is correct. If the user decides the 
output is incorrect, the MAE 303 retrieves the candidate list from the database (e.g., 
virtual dictionary 306), through DMM 305, and displays the candidate list in a user 
interface. In one embodiment, the user interface may be a pop-up window. The 
user can then select the best choice (e.g., final choice) from the candidate list as an 
output. In a further embodiment, the output may be transmitted to an application 
through an application programming interface (API), from which the application 
may select a final choice. 

[0040] In another embodiment, if the candidate list does not contain the 
output the user desires, the invention further provides means for the user to enter 
the final output manually and force the system to convert the hiragana characters to 
katakana characters. The system will update its database (e.g., virtual dictionary 
306 or regular dictionaries 307) to include the final output katakana word entered by 
the user for future reference. In a further embodiment, the user may modify 
the rules applied to the conversion and store the user-specific rules in the database 
304. 
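A minimal sketch of the database update described in this paragraph, assuming an in-memory store: the `LearningDictionary` class and its methods are hypothetical names, and placing the most recently entered word first is a design choice of this sketch rather than something the application specifies.

```python
class LearningDictionary:
    """Toy dictionary that remembers conversions the user entered by hand."""

    def __init__(self) -> None:
        self._entries: dict[str, list[str]] = {}

    def lookup(self, reading: str) -> list[str]:
        """Return the learned words for a reading (empty if none)."""
        return list(self._entries.get(reading, []))

    def learn(self, reading: str, word: str) -> None:
        """Store a user-supplied word, most recent first, without duplicates."""
        words = self._entries.setdefault(reading, [])
        if word in words:
            words.remove(word)
        words.insert(0, word)


dictionary = LearningDictionary()
dictionary.learn("さんのぜ", "サンノゼ")
print(dictionary.lookup("さんのぜ"))
```

After the update, a later conversion of the same reading can find the word without further user interaction.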

[0041] Figure 6A shows a block diagram of an embodiment of the invention. 
A Japanese hiragana character string 601, which has the English meaning "Watch a 
movie in San Jose", is inputted to the system. The morphological analysis engine 
(MAE) 604 will look up a database, such as dictionaries 307, to search for 
corresponding Japanese words. The system transmits the portion 602 to the 
MAE 604 through a user interface 616. The MAE 
604 divides the input into a plurality of sub-strings, communicates with the 
dictionary management module (DMM) 608, and looks up dictionaries 606 for a direct 
translation of each sub-string. Meanwhile, the DMM instructs the virtual 
dictionary 609 to create all possible katakana words corresponding to each 
sub-string. As a result, a pool of words 605 is formed with regular Japanese words from 
the regular dictionaries 606 and artificially created katakana words from the virtual 
dictionary 609. In one embodiment, each of those Japanese character strings 605 is 
associated with a usage frequency value, and there is connection relationship 
information between the character sets. In another embodiment, each of the 
character strings 605 is associated with a priority value. Typically the priorities of 
the artificially created katakana words are lower than those of the regular words from the 
regular dictionaries to prevent any confusion. That is, the system will pick the 
regular words from the regular dictionaries over the artificially created katakana 
words. The system utilizes the artificially created words only when there are no 
corresponding regular words in dictionaries 606. The priority information may 
be stored in the dictionaries 606 as well. Next, the MAE 604 evaluates and 
analyzes the character strings 605 and applies a set of rules from the database 607. 
Although database 607 and dictionary 606 are illustrated as separate databases, it 
will be appreciated that these two databases may be combined to form a single 
database. The MAE 604 constructs another set of character strings 610 from the 
character strings 605, based on the set of rules. The words 610 are considered a 
candidate list, where a word with a low cost value, such as word 611, is considered higher priority, 
while a word with a high cost value, such as word 612, is 
considered lower priority. Other priority schemes may exist. Based on the 
candidate list, the MAE 604 selects the candidate with the highest priority, such as 
character string 613, as the final target character string. The character string 613 is then 
applied to the rest of the character strings to form the final sentence 614. 
[0042] Figure 6B compares a method used by the invention with a conventional 
method. Referring to Figure 6B, a Japanese hiragana character string 651, which 
has the English meaning "San Jose", is inputted through an input method. The 
input method normally divides the input into multiple sub-strings 652. For each of 
the multiple sub-strings, the dictionaries 653 are used to convert the sub-strings 652 
as far as possible into another set of Japanese words 654. The dictionaries 653 
normally contain most of the known words, such as word 663. However, a 
word such as "San Jose" (word 662) is not known to the dictionaries. Thus, the 
dictionaries are not able to convert it, leaving the word 662 unavailable. A 
conventional method will perform analysis on the words 654, apply rules 664 
(e.g., grammar rules), and generate a candidate list 660, from which word 661 is 
selected as the final candidate, which is incorrect. As a result, a user must manually 
convert the input 651 to generate the correct conversion. 

[0043] The present invention introduces a virtual katakana dictionary 655. In 
addition to the conversion using regular dictionaries, the virtual dictionary 655 takes 
the sub-strings 652 and creates a set of corresponding artificial katakana words 656. 
By combining the regular words 654 from the dictionaries 653 with the artificial 
katakana words 656 generated from the virtual dictionary 655 and applying a set of 
rules, the full set of words 658 corresponding to the sub-strings is created. As a 
result, each of the sub-strings has a corresponding converted string, which may be 
a regular Japanese word, such as word 663, or an artificial katakana word. The 
invention then creates a candidate list 658 based on the set of rules 657. Each of the 
candidates is associated with a priority based on the rules. From the candidate list, 
the word with the highest priority is selected as the final, correct candidate 659. 
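
By way of illustration only, the generation of artificial katakana words by a virtual dictionary may be sketched as follows. The sketch is not the implementation of the invention. It assumes that each hiragana character is mapped to its katakana counterpart through the fixed Unicode offset of 0x60 between the two blocks, and that an entry is produced for every sub-string of the input. The function names `to_katakana` and `virtual_lookup`, and the sample input さんのぜ (an assumed hiragana spelling of "San Jose"), are hypothetical.

```python
# Illustrative sketch of a "virtual" katakana dictionary: no word list is
# stored; entries are generated on demand from the hiragana input.
# All names here are hypothetical and not taken from the specification.

HIRAGANA_START = 0x3041  # first character of the Unicode hiragana block
HIRAGANA_END = 0x3096    # last hiragana letter that has a katakana twin
KANA_OFFSET = 0x60       # distance from a hiragana letter to its katakana

def to_katakana(text: str) -> str:
    """Convert hiragana letters to katakana; leave other characters as-is."""
    out = []
    for ch in text:
        code = ord(ch)
        if HIRAGANA_START <= code <= HIRAGANA_END:
            out.append(chr(code + KANA_OFFSET))
        else:
            out.append(ch)
    return "".join(out)

def virtual_lookup(text: str) -> list[tuple[int, int, str]]:
    """Return (start, end, katakana_word) for every sub-string of text."""
    entries = []
    for start in range(len(text)):
        for end in range(start + 1, len(text) + 1):
            entries.append((start, end, to_katakana(text[start:end])))
    return entries

entries = virtual_lookup("さんのぜ")
# The entry covering the whole input is (0, 4, "サンノゼ").
```

Because the entries are computed rather than stored, such a dictionary can return a katakana word for a sub-string that no regular dictionary contains.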
[0044] Figure 7 shows a method of an embodiment of the invention. Referring 
to Figures 6A and 7, the method starts with inputting 701 Japanese hiragana 
characters, such as hiragana character string 601. It divides 702 the hiragana 
character string into multiple sub-strings and converts 708 each of the sub-strings 
into Japanese words through a dictionary, such as dictionaries 606. In the 
meantime, the method creates 703 all possible katakana character strings related to 
the input through the virtual dictionary 609. A pool of Japanese words 605 is 
formed from both the regular words and the artificial katakana words. The method 
then constructs 704 a candidate list 610, in which a candidate with a lower cost value 
has higher priority and a candidate with a higher cost value has lower priority. The 
priority of the artificially created katakana words may be assigned by the virtual 
dictionary. The method then analyzes 705 the candidate list and selects 707 the best 
candidate 613 (e.g., the candidate with the lowest cost value) based on the analysis. 
The final candidate is then output 708 to form the final sentence 614. 
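
By way of illustration only, the flow of Figure 7 may be sketched as a minimum-cost segmentation. The specification does not prescribe a search technique; the sketch substitutes a standard dynamic program over sub-string boundaries. The toy dictionary `REGULAR`, the cost values, the cost model for artificial words (4.0 plus 1.0 per character), and the sample input さんのぜにいく (an assumed hiragana sentence meaning "go to San Jose") are all invented for the example.

```python
import math

# Hypothetical regular dictionary: hiragana reading -> [(word, cost)].
REGULAR = {
    "さん": [("三", 3.0)],
    "の": [("の", 1.0)],
    "に": [("に", 0.5)],
    "いく": [("行く", 1.5)],
}

def katakana(text):
    # Hiragana U+3041..U+3096 maps to katakana by adding 0x60.
    return "".join(chr(ord(c) + 0x60) if "ぁ" <= c <= "ゖ" else c for c in text)

def virtual_words(reading):
    # Assumed cost model: an artificial word costs more than a known word,
    # and the cost grows with its length.
    return [(katakana(reading), 4.0 + 1.0 * len(reading))]

def convert(reading):
    """Return (total_cost, words) for the cheapest segmentation of reading."""
    n = len(reading)
    # best[i] holds the cheapest (cost, words) found for reading[:i].
    best = [(math.inf, [])] * (n + 1)
    best[0] = (0.0, [])
    for end in range(1, n + 1):
        for start in range(end):
            sub = reading[start:end]
            prev_cost, prev_words = best[start]
            # Pool of regular words and artificial katakana words.
            pool = REGULAR.get(sub, []) + virtual_words(sub)
            for word, cost in pool:
                total = prev_cost + cost
                if total < best[end][0]:
                    best[end] = (total, prev_words + [word])
    return best[n]

result = convert("さんのぜにいく")
# -> (10.0, ['サンノゼ', 'に', '行く'])
```

In this toy example the sub-string ぜ has no entry in the regular dictionary, so without the artificial katakana words no complete conversion would exist. With them, the artificial word サンノゼ competes on cost with the regular words and is part of the lowest-cost result.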

[0045] Figure 8 shows another embodiment of the invention, in which the 
invention may involve user interaction. The input 601 contains a Japanese hiragana 
character string in which portion 602 (e.g., "San Jose") cannot be directly converted, 
while portion 603 can be converted through regular dictionaries 606. The system 
then uses virtual dictionary 609 to create all possible corresponding katakana words 
for every sub-string of portion 602. The morphological analysis engine 
(MAE) 604 then constructs a candidate list 610 based on a set of rules. The set of 
rules may include the usage frequency of characters and connection relationship 
information between the characters. In another embodiment, the set of rules may 
contain semantic and grammar rules. A cost value is calculated for each candidate 
in the list. The candidate with the lowest cost value has the highest priority, while 
the candidate with the highest cost value has the lowest priority. As shown in Figure 8, 
candidate 611 has the highest priority among the candidates in the list. As a result, 
candidate 611 is selected as the final choice for the conversion by the evaluation unit 
609. However, in some rare cases, the choice 611 may not be correct, in which 
case user interaction 615 is involved. During the user interaction, the user selects 
a portion of the input, such as portion 602, which has an English meaning of "San 
Jose", and instructs the system to convert it. The system then retrieves the pool of all 
candidates, such as candidate list 610. In one embodiment, the candidate list is 
displayed through a user interface, such as a pop-up window. From the list, the user 
selects the final output 616, which forms the final sentence 614. Based on the user's 
selection, the system may update its database (e.g., dictionaries 606 and virtual 
dictionary 609), so that subsequent conversions will most likely succeed. 
[0046] Figure 9 shows a method of another embodiment of the invention for 
converting a source character string to a target character string. The method 
receives a first character string having the source character string from a user 
interface. It divides the first character string into multiple sub-strings. It then 
converts the sub-strings to second character strings through a dictionary. At the 
same time, the method creates third character strings corresponding to the 
sub-strings through a virtual dictionary. It then analyzes the second and third 
character strings and constructs fourth character strings based on the analysis. Next, 
it creates a candidate list based on the priority information and selects the final 
candidate with the highest priority from the candidate list. 

[0047] Referring to Figure 9, a Japanese hiragana character string is received 901 
through a user interface, such as a keyboard. In one embodiment, the user interface 
may include a touch pad of a Palm Pilot or another input device. In a further 
embodiment, the Japanese hiragana character string may be received from 
application software through an application programming interface (API). The 
hiragana character string is divided 902 into multiple sub-strings. The 
morphological analysis engine (MAE) then communicates with the dictionary 
management module (DMM) to convert 903 each of the sub-strings into 
corresponding Japanese words through regular dictionaries. At the same time, the 
MAE also instructs the DMM to create 904 all possible katakana words corresponding 
to the sub-strings through the virtual dictionary. Next, the system constructs 905 
possible candidates from all the possible words, including Japanese words from the 
regular dictionaries and artificial katakana words generated from the virtual 
dictionary, and forms a candidate list. The possible choices of the katakana words 
from the virtual dictionary may include part-of-speech information. The system may 
use a set of rules to construct the candidates. In one embodiment, the set of rules 
may include the usage frequency of each katakana character set and the connection 
relationship between each choice. In another embodiment, the set of rules may 
include semantic or grammar rules for words. This information may be stored in the 
database where all the possible katakana character sets are stored. In another 
embodiment, these rules may be stored in a separate database. The system next 
retrieves 906 the usage frequency and connection relationships from the database, 
and applies 907 the semantic and grammar rules to the analysis. Based on this 
information, the system calculates 908 cost values for all candidates. The candidate 
with the lowest cost value is then selected 909 as the final target character set. The 
final target character set may be displayed in a user interface on a display device. 
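
By way of illustration only, the cost calculation of steps 906 through 909 may be sketched as follows. The specification does not give a cost formula. The sketch assumes that the cost of a word is the negative logarithm of its relative usage frequency, and that a connection cost is looked up for each pair of adjacent parts of speech. The tables `USAGE` and `CONNECTION`, the default penalty, and all numeric values are invented for the example.

```python
import math

# Hypothetical usage counts (word -> count out of TOTAL observations).
USAGE = {"サンノゼ": 5, "三": 50, "の": 400, "ゼ": 1, "に": 400, "行く": 100}
TOTAL = 1000

# Hypothetical connection costs between adjacent parts of speech.
# "BOS" marks the beginning of the sentence.
CONNECTION = {
    ("BOS", "noun"): 0.0,
    ("noun", "particle"): 0.5,
    ("particle", "verb"): 0.5,
    ("particle", "noun"): 1.0,
    ("noun", "noun"): 2.0,
}
DEFAULT_CONNECTION = 5.0  # assumed penalty for pairs not listed above

def word_cost(word):
    """Rarer words cost more; unknown words are treated as seen once."""
    return -math.log(USAGE.get(word, 1) / TOTAL)

def candidate_cost(candidate):
    """Sum word costs and connection costs over (word, part_of_speech) pairs."""
    total = 0.0
    prev_pos = "BOS"
    for word, pos in candidate:
        total += word_cost(word)
        total += CONNECTION.get((prev_pos, pos), DEFAULT_CONNECTION)
        prev_pos = pos
    return total

candidates = [
    [("サンノゼ", "noun"), ("に", "particle"), ("行く", "verb")],
    [("三", "noun"), ("の", "particle"), ("ゼ", "noun"),
     ("に", "particle"), ("行く", "verb")],
]

# The candidate with the lowest cost value is selected as the final target.
best = min(candidates, key=candidate_cost)
```

With these assumed values the first candidate has the lower cost, because the second candidate pays for more words and more connections, including the rarely used word ゼ.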
[0048] In yet another aspect of the invention, a user may inspect 910 the result 
provided by the kana-kanji conversion engine to check 911 whether the conversion 
is correct. If the user is satisfied with the result, the conversion is done. However, 
if the conversion is incorrect, the user selects 912 the portion of the input (e.g., 
the original hiragana input) and instructs the system to explicitly convert it. The 
system in turn provides all possible combinations of Japanese words, including the 
artificial katakana words, in the form of a candidate list. The user then retrieves 913 
the candidate list, which is displayed in a user interface. In one embodiment, the 
user interface is in the form of a pop-up window. Next, the user may check 914 
whether the candidate list contains the correct conversion. If the candidate list 
contains the correct conversion, the user selects 915 the best candidate from the 
candidate list. The system then updates 916 the parameters (e.g., usage frequency, 
connection relationship, etc.) in its database (e.g., knowledge bases) according to 
the user selection. The final selection is then output 917 to the application. In one 
embodiment, if the candidate list does not contain the correct result, the user may 
construct 918 the correct result manually through a user interface. Once the 
conversion has been created by the user, the system saves 919 the result in its 
database for future reference. 
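
By way of illustration only, the database update of steps 914 through 919 may be sketched as follows. The sketch assumes a simple in-memory store that keeps a usage count for each reading and word. The class `ConversionStore` and the function `confirm` are hypothetical names, and the specification does not state how the parameters are actually stored or updated.

```python
# Illustrative in-memory "database": reading -> {word: usage count}.
class ConversionStore:
    def __init__(self):
        self.usage = {}

    def candidates(self, reading):
        """Return the known words for a reading, most often chosen first."""
        counts = self.usage.get(reading, {})
        return sorted(counts, key=lambda word: -counts[word])

    def record_choice(self, reading, word):
        """Raise the usage count of the chosen word, adding it if new."""
        counts = self.usage.setdefault(reading, {})
        counts[word] = counts.get(word, 0) + 1

def confirm(store, reading, candidate_list, user_choice):
    """Return the final output and whether the user created it manually."""
    # If check 914 fails, the user constructs the result manually (918).
    manual = user_choice not in candidate_list
    # Steps 916 and 919: save the selection for future conversions.
    store.record_choice(reading, user_choice)
    return user_choice, manual

store = ConversionStore()
# First pass: the candidate list lacks the correct word, so it is created.
first = confirm(store, "さんのぜ", ["さんのぜ"], "サンノゼ")
# Second pass: the stored word now appears in the candidate list.
second = confirm(store, "さんのぜ", store.candidates("さんのぜ"), "サンノゼ")
```

After the first pass the manually created word is available as a candidate, so a later conversion of the same input can offer it without further manual construction.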

[0049] In the foregoing specification, the invention has been described with 
reference to specific exemplary embodiments thereof. It will be evident that various 
modifications may be made thereto without departing from the broader spirit and 
scope of the invention as set forth in the following claims. The specification and 
drawings are, accordingly, to be regarded in an illustrative sense rather than a 
restrictive sense. 